Improving out-of-vocabulary name resolution
Abstract
This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
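The online pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the phonetic distance used, so everything below is an assumption made for illustration: `phonetic_key` is a hypothetical, Soundex-like consonant-class encoding, `edit_distance` is plain Levenshtein distance over those keys, and `prune_name_list` with its `max_distance` threshold is an invented helper, not the authors' method.

```python
from typing import Iterable, List

# Hypothetical consonant classes (Soundex-like); NOT taken from the paper.
_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def phonetic_key(name: str) -> str:
    """Map a name to a crude phonetic key of consonant-class digits.

    Vowels, h, w, y, and non-letters are skipped, and a code equal to the
    previously emitted code is dropped (even across skipped characters).
    """
    key: List[str] = []
    for ch in name.lower():
        code = _CLASSES.get(ch)
        if code is None:
            continue
        if not key or key[-1] != code:
            key.append(code)
    return "".join(key)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def prune_name_list(hypothesis: str, name_list: Iterable[str],
                    max_distance: int = 1) -> List[str]:
    """Keep names whose phonetic key is within max_distance of the hypothesis."""
    target = phonetic_key(hypothesis)
    return [name for name in name_list
            if edit_distance(target, phonetic_key(name)) <= max_distance]


# Offline: a (tiny, made-up) name list. Online: prune against an OOV hypothesis.
names = ["Milosevic", "Miloshevich", "Clinton", "Yeltsin"]
print(prune_name_list("Milosevich", names))
# prints ['Milosevic', 'Miloshevich']
```

In this toy setting the same pruning also surfaces alternate spellings of one name, which is the kind of output the abstract suggests could serve query expansion. A real system would more plausibly compare phone sequences from a pronunciation model than letter classes.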
Related papers
About improving recognition of spontaneously uttered French city-names
This paper deals with the recognition of French city-names over the telephone. This recognition task, critical in many applications, involves a 40,000 city-name vocabulary, ranging from short monosyllabic words to long official compound names. Data collected from a field experiment are analyzed, and several ways of improving speech recognition performance are investigated. This includes a carefu...
Arabic-to-English Translation for IWSLT 2006
We present techniques for improving domain-specific translation quality with a relatively high OOV ratio on test data sets. The key idea is to maximize the vocabulary coverage without degrading the translation quality. We maximize vocabulary coverage by segmenting a word into a sequence of morphemes, prefix*-stem-suffix*, and by adding a large amount of out-of-domain training corpora. To preserve...
Placename Ambiguity Resolution
It is common for placenames to reference other named entities (e.g. names of people, names of organizations, etc.) and to be used as vocabulary words (e.g. city of Split). Apart from reference ambiguity, placenames are faced with the problem of referent ambiguity (i.e., a placename referring to multiple places). Many places are also referred to by multiple names (e.g. Netherlands vs. Holland). ...
Geographical Scope Resolution
It is common for placenames to reference other named entities (e.g., names of people, names of organizations, etc.) and to be used as vocabulary words (e.g., city of Split). Apart from reference ambiguity, placenames are faced with the problem of referent ambiguity (i.e., a placename referring to multiple places). Many places are also referred to by multiple names (e.g., Netherlands vs. Holland...
The Relationship between Iranian Upper-Intermediate EFL Learners’ Contrastive Lexical Competence and Their Use of Vocabulary Learning Strategies
Regarding the vital role of lexical competence as an important requisite for the attainment of full mastery of the four language skills, this study tried to investigate the relationship between Iranian EFL learners’ contrastive lexical competence and their use of vocabulary learning strategies. To fulfil this objective, 60 Iranian upper-intermediate male and female language learners were select...
Journal: Computer Speech & Language
Volume: 19
Publication date: 2005